Kozakia baliensis belongs to the family Acetobacteraceae and was described for the first time in 2002. These acetic acid bacteria are able 
to produce acetic acid from various carbon sources and 2- and 5-keto-D-gluconate from glucose. The novel K. baliensis strain SR-745 
was isolated from a pineapple fruit bought in a German supermarket. The strain produces large amounts of organic acids when grown 
on glucose-containing medium and also accepts glycerol, fructose, mannitol, and sucrose as carbon sources. When grown under light and 
high-oxygen conditions in submerged culture, the production of a pink pigment is observed after 72 h. 
Kozakia baliensis SR-745 was obtained by classical microbiological 
isolation techniques from a pineapple bought in a German 
supermarket. The strain was identified as a Kozakia strain by 16S 
rRNA analysis. A BLAST analysis indicated that K. baliensis NBRC 
16679 (100% identity in 16S rRNA) is its closest neighbor (1). The 
whole-genome shotgun sequence of K. baliensis SR-745 was ob- 
tained by one Illumina MiSeq run and one Illumina GAIIx run 
independently performed in Germany and Japan, respectively, 
based on the same DNA sample. The genomic DNA of K. baliensis 
SR-745 was obtained using an adapted method of Chen and Kuo 
(2), and shearing and library preparation were done in accordance 
with the Illumina TruSeq DNA sample preparation guide version 
2 (3); the only noteworthy deviation is the substitution of the step 
"purify ligation products (gel method only)" with "purify cDNA 
construct" from the TruSeq small RNA sample preparation guide 
(4) for the sequencing in Germany. 

The sequencing in Germany yielded 1,211,830 paired-end 
reads, with read lengths ranging from 35 bp to 151 bp (median, 
150 bp). The sequencing in Japan yielded 26,561,095 single reads, 
with an average length of 109 bp. The reads from Germany were 
trimmed and quality filtered using TrimmingReads.pl (5), cut- 
adapt (6), and DynamicTrim.pl (7). Quality assessment was 
done using FastQC (http://www.bioinformatics.babraham.ac.uk/ 
projects/fastqc/) and SolexaQA (7). After processing, 759,574 for- 
ward and 919,579 reverse reads remained, with 1,145,218 paired 
reads and 533,935 singletons. The Japanese reads were trimmed 
and quality filtered using CLC Genomics Workbench version 
6.5.1 and, subsequently, TrimmingReads.pl and DynamicTrim.pl. 
After processing, 25,849,107 reads remained. Assemblies 
of the combined data from Germany and Japan were carried out 
using Velvet 1.2.08 (8). All k-mer values from 15 to 83 were examined. 
The assembly using a k-mer length of 55 yielded the highest 
total sequence length (3,172,521 bp; N50, 92,060 bp, without the 
PhiX contig) and was used for further analyses. The G+C content 
of the assembled contigs is 57.5%. The draft genome is made up of 
106 scaffolds, which are composed of 114 contigs. Annotation was 
carried out by uploading the generated scaffolds to RAST (9), 
which found 3,151 coding sequences comprising 1,390 genes in 
several subsystems. A total of 48 RNA genes were detected 
by RAST analysis. Fifty-five genes were identified in the monosaccharide 
subsystem, including those for the utilization of ribose 
and xylose and the metabolism of mannose, galactose, and glu- 
conate and ketogluconate. 
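
The assembly statistics quoted above (total length, N50, G+C content) and the read tallies after filtering can be recomputed from the raw numbers. The following Python sketch is illustrative only: it is not the code used in this work, the function names are hypothetical, and it assumes the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. G+C content is computed here over unambiguous bases only, which is also an assumption.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for non-empty input


def gc_percent(sequences: Iterable[str]) -> float:
    """Return the G+C content in percent, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in sequences:
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        at += upper.count("A") + upper.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


def best_k_by_total_length(total_length_by_k: Dict[int, int]) -> int:
    """Pick the k-mer length whose assembly has the highest total length."""
    return max(total_length_by_k, key=total_length_by_k.get)


# Read counts reported in the text for the German data set after filtering.
FORWARD_READS = 759_574
REVERSE_READS = 919_579
PAIRED_READS = 1_145_218
SINGLETONS = 533_935

# Every surviving read is either part of a pair or a singleton.
reads_are_consistent = FORWARD_READS + REVERSE_READS == PAIRED_READS + SINGLETONS
```

Applied to the contig or scaffold lengths of a given Velvet run, `n50` and `gc_percent` yield values comparable to those reported here, and `best_k_by_total_length` mirrors the selection criterion stated in the text (highest total sequence length across the examined k-mer values). The final comparison confirms that the reported forward and reverse read counts sum to the reported paired reads plus singletons.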

Nucleotide sequence accession numbers. This whole-genome 
shotgun project has been deposited at DDBJ/EMBL/GenBank un- 
der the accession no. JNAB00000000. The version described in 
this paper is version JNAB01000000. 